ABSTRACT

An  experimental  software  tool  for  simulating  the  behavior  of 
distributed  algorithms  is  proposed.  The  primary  motivation  for  developing 
the tool is to study distributed database algorithms. Also, a classification
of techniques presently used for the distributed database problems of
concurrency control and recovery is presented. This classification will be
used to reduce the experimentation necessary to compare the performance of
alternative algorithms.

The study and development of distributed algorithms in general, and of
distributed database algorithms in particular, is hampered by a limited
understanding of the behavior of distributed systems. Both intuition and
present-day analytical tools are inadequate to characterize their behavior.
Another barrier to understanding such algorithms is the complexity of their
interaction, due to the potential lack of synchronization between nodes of
a distributed system. Finally, it is not yet clear what "good" behaviors
are reasonable to expect from a distributed system. As a result, a
multitude of algorithms may exist for solving a single problem, but without
more experience and analysis, their behavior cannot be well understood or
compared.

This report describes an approach to providing the experience necessary
for understanding the behavior of these algorithms.
CHAPTER 1
Introduction 


1.1 The Problem

The basic problem to be addressed by this research project is the
development of a methodology for analyzing and comparing distributed
database system design alternatives. This problem is both general and
specific. In general, we may ask whether there are rules or guidelines for
choosing one database design alternative over another. For specific
databases, we may ask which design alternative works best according to the
requirements of the database. The approach taken addresses both questions,
in that studies will be done to determine the general guidelines, but the
tool developed for those studies will also be usable in designing specific
databases. The design alternatives to be addressed by the studies in this
project are the choices of algorithms for the following problems:
concurrency control, reliability, and query processing. These algorithms
have been chosen because of their central importance to database processing
and also because a number of alternative algorithms have already been
developed for each problem.

The  difficulties  of  studying  any  distributed  database  algorithms  are 
numerous.  First,  only  a  few  of  the  proposed  alternatives  have  been 
implemented  at  any  single  site.  Thus  there  is  little  experience  with  their 
performance  in  general.  As  a  result  intuition  about  their  behavior  is 
unreliable.  This  makes  it  very  difficult  even  to  develop  reasonable 
hypotheses  about  their  behavior.  Second,  the  behavior  of  a  distributed 
system  is  much  more  complex  than  the  behavior  of  a  centralized  system.  It 
is  necessary  to  consider  not  only  the  behavior  of  a  single  system  in 
isolation,  but  also  its  interactions  with  the  other  nodes  of  the  system. 
For  this  reason,  it  can  be  exceptionally  difficult  to  prove  anything  about 
a distributed algorithm, even that it works correctly. Third, the
alternatives designed to solve a given problem make different assumptions about
the  system  on  which  they  are  run.  They  may  assume  different  topologies, 
different  protocols,  and  different  process  structures.  Even  correctness 
criteria  may  vary.  Finally,  few  analytical  tools  for  studying  distributed 
systems  have  been  developed  so  far. 
1.2 Objectives

The objectives of this project are:

• development of a software tool for analyzing and studying the
design alternatives;

• application of the tool to distributed database design
alternatives;

• development of new solutions to distributed database problems using
the results of the above study; and

• development of experimental and analytical techniques for studying
distributed algorithms in general.

The first objective of this project is the development of an experimental
tool for the study of distributed systems, especially distributed database
systems. The central experimental tool will be a combination testbed and
simulation system. It will allow an algorithm to be coded as a module
of the system. The algorithm can then be tested in this environment.
Subsequently, the behavior of the algorithm can be studied with the aid of
the simulation facilities provided by the system.

The second objective is to apply the tool to a study of distributed
databases. The goal of applying this experimental tool will be to
determine how the structure of a system relates to its expected behavior.
The assumption is that reasonable structural properties will correspond to
good (or bad) behavior in a predictable way. For example, using the
classification of concurrency control mechanisms into locking algorithms and
timestamping algorithms, we may ask which is more efficient, more robust,
or more fair. This should not be taken to imply that only this
classification will be used. In fact, one part of this objective is to
determine which classifications provide the most information about
behavior.

The  third  objective  is  to  use  the  results  of  the  above  studies  to 
develop  new  solutions  to  distributed  database  problems,  where  it  is  clear 
from  the  previous  work  that  existing  solutions  could  be  improved  on. 

The final objective is to develop experimental and analytical
techniques for studying distributed algorithms. New techniques to be developed
obviously cannot be predicted, but the tool itself provides one experimental
technique for studying distributed algorithms. Also, experience with the
tool should suggest refinements. In addition, the usefulness of various
classifications of distributed database algorithms (e.g., BER80, BAD81,
HSI81) will be tested. This testing will suggest connections between the
classification of an algorithm and its performance that may be used in
analysis.

1.3 The Approach

The approach will include the following steps:

• development of a general model of distributed database processing;

• development of the testbed/simulation model;

• validation of the correctness of the system with each design
alternative to be tested;

• implementation of the design alternatives for concurrency control
and reliability mechanisms as modules of the system;

• simulation experiments to collect empirical data about the behavior
of the system with various design alternatives;

• development of hypotheses, on the basis of the experimental data,
concerning the behavior of the distributed system with various
types of designs; and

• development of analytical proofs of these hypotheses if possible.

The model of distributed database processing will be based on that of
Bernstein and Goodman [BER80]. It will be more general in that reliability of
the communication system will not be assumed; transaction managers and data
managers will be allowed to communicate with either transaction managers or
data managers; and in fact a transaction may be passed around to multiple
transaction managers for processing, as described in [ROS78].

A central decision to be made in the development of the testbed/
simulation model is the choice between a distributed simulation and a
centralized simulation. The advantages of distributed simulation are that
the testing feature will be more convincing if the simulation system is
itself distributed and that it will be more efficient if the communication
system is sufficiently fast. The disadvantages are increased hardware cost
and overhead; the problems of dealing with time; and the need to develop the
software for it. Most of the software for a centralized simulation has
been written and tested on an existing "ticket-sales" database.

While the number of potential algorithms to be implemented seems
prohibitively large, two factors reduce the problem to manageable size:
first, the essential parts of the algorithms are relatively small programs,
and second, not all algorithms need to be implemented, just those
representative of important classes of algorithms. The plan of attack, in
the area of concurrency control, is to build on the work of Bernstein and
Goodman [BER80]; Badal [BAD81]; and Hsiao and Ozsu [HSI81]. Each of these
papers contains a classification of concurrency control algorithms by their
structural properties (e.g., voting or locking; centralized or
decentralized). Such classifications will be used as a starting point for
analyzing the behavior of the algorithms.

For the experimental results to be of any use, the algorithms must
first be verified. Several techniques can be applied: traditional proof
techniques, mutation analysis [ACR79], and traditional testing. Also, the
data supplied to the system describing the data processing requirements
must be realistic. Some possible sources of data for systems which are
either partially distributed or reasonable candidates for distribution are
banks (e.g., automated teller systems), airlines (e.g., ticketing systems),
and the military (e.g., personnel and inventory systems).

Some  of  the  measures  of  system  performance  to  be  used  in  analyzing 
the  results  are: 

• Average user waiting time;

• Throughput;

• Average queue length at each node; and

• Utilization.

Other measures that need to be considered, to determine whether they are
reasonable to look at in a distributed system, are fairness, avoidance of
starvation, blocking, degree of concurrency, and so forth.
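
As a rough illustration of how three of the listed measures could be computed from simulation output, the following Python sketch assumes a single node that serves one transaction at a time and a hypothetical record format of (arrival, start, finish) times. The function name and record layout are assumptions made for this example and are not part of the proposed tool; average queue length is omitted.

```python
def summarize(records, elapsed):
    """Compute simple performance measures for one node.

    records: list of (arrival, start, finish) times, one per completed
             transaction (hypothetical format, assumed for this sketch).
    elapsed: total simulated time observed.
    """
    n = len(records)
    # Waiting time is the delay between arrival and the start of service.
    avg_wait = sum(start - arrival for arrival, start, _ in records) / n
    # Throughput is completed transactions per unit of simulated time.
    throughput = n / elapsed
    # Utilization is the fraction of time the node was busy
    # (valid here because service periods do not overlap).
    utilization = sum(finish - start for _, start, finish in records) / elapsed
    return {"avg_wait": avg_wait, "throughput": throughput,
            "utilization": utilization}


stats = summarize([(0, 0, 2), (1, 2, 5), (4, 5, 6)], elapsed=10)
```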

1.4 Significance

The work done on this project will contribute in a number of ways to
the understanding of distributed database systems and to the methodology
for designing them. First, the testbed and simulation tool will be usable
not only for the duration of this project but will be available for
additional work on distributed database systems. Furthermore, it should be
sufficiently general to be used for other distributed system projects at
Georgia Tech. Second, the tool will be applicable to the design of
specific distributed database systems. The use of the tool to test the
behaviors of various distributed database algorithms will serve as a
thorough test of its correctness and performance. Third, the study of
design alternatives for distributed database systems, using the tool, will
provide better understanding of the range of alternatives which are
reasonable for any particular case, and thus reduce the design problem.
Fourth, improved understanding of the behavior of different algorithms for
concurrency control, query processing, and reliability may suggest better
algorithms. Finally, extensive empirical studies of a distributed database
system will provide experience on which to base principles of behavior that
any reasonable distributed database system ought to obey.

CHAPTER  2 
Background 


2.1  General  Remarks 

The two problems to be studied are concurrency control and
reliability. Solutions to these problems will be interdependent, since
reliability mechanisms are required to guarantee that concurrent
transactions appear atomic to system users in spite of site failures. There are
also interactions between the choice of a concurrency control algorithm and
the techniques used to provide a reliable system. For example, some
concurrency control algorithms are designed to continue functioning
correctly in spite of site failures. Others require system reconfiguration when
a site fails.

2.2 Concurrency Control

Concurrency  control  in  a  database  (distributed  or  not)  is  a  means  of 
guaranteeing  correct  behavior  while  allowing  maximal  concurrency.  As  an 
example  of  the  problems  that  can  arise  if  uncontrolled  concurrency  is 
allowed,  consider  a  bank  automated  teller  system.  Suppose  that  a 
customer's  balance  is  stored  redundantly  at  each  of  several  locations. 
Then, with uncontrolled concurrency, a customer could arrange to have
withdrawals of the entire balance initiated simultaneously at two remote sites;
but  the  balance  after  these  transactions  would  reflect  only  one  of  the 
withdrawals.  This  would  be  nice  for  the  customer,  but  disastrous  for  the 
bank. 
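
The anomaly can be made concrete with a small Python sketch. It is only an illustration under assumed names (the `replicas` dictionary and the helper function are hypothetical): two sites each read their own copy of the balance before either update is propagated, so both withdrawals are approved and the second write hides the first.

```python
def read_and_check(replicas, site, amount):
    """Read the local copy, check funds, and return the balance to write."""
    balance = replicas[site]
    if balance < amount:
        raise ValueError("insufficient funds")
    return balance - amount


replicas = {"site_a": 100, "site_b": 100}   # redundant copies of one balance
amount = 100

# Both sites read and approve before either update has been propagated.
new_a = read_and_check(replicas, "site_a", amount)
new_b = read_and_check(replicas, "site_b", amount)
dispensed = 2 * amount                      # cash handed out at the two sites

# Each result is then written to every copy; the second overwrites the first.
for site in replicas:
    replicas[site] = new_a
for site in replicas:
    replicas[site] = new_b

final_balance = replicas["site_a"]          # reflects only one withdrawal
```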

The  solution  to  this  type  of  problem  is  to  use  a  concurrency  control 
algorithm,  which  prevents  this  type  of  behavior.  The  standard  criterion  of 
correctness in a database was developed by Eswaran, Gray, Lorie, and
Traiger in [ESW76]. Their model of a database includes entities, each of
which has a name and a value, and integrity constraints, which may be
expressed as predicates and restrict the set of values that may be taken on
by the entities in the database. For example, in the bank database, we
would require that an entity representing a balance be nonnegative and that
any two entities representing the same balance (perhaps at different sites
of a distributed database) be equal in value. A database state which
satisfies  all  of  the  integrity  constraints  is  a  consistent  database  state. 
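
A minimal sketch of this model in Python, assuming a database state is a dictionary from entity names to values and each integrity constraint is a predicate over the state. The entity names and the `is_consistent` helper are hypothetical and chosen only to mirror the bank example.

```python
def is_consistent(state, constraints):
    """A state is consistent if it satisfies every integrity constraint."""
    return all(check(state) for check in constraints)


# Entity names are (account, site) pairs; values are balances.
constraints = [
    # Every balance must be nonnegative.
    lambda s: all(value >= 0 for value in s.values()),
    # Copies of the same balance at different sites must agree.
    lambda s: s[("acct1", "site_a")] == s[("acct1", "site_b")],
]

good_state = {("acct1", "site_a"): 50, ("acct1", "site_b"): 50}
bad_state = {("acct1", "site_a"): 50, ("acct1", "site_b"): 20}
```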
The unit of activity on a database is the transaction. A transaction
consists  of  a  set  of  basic  database  actions,  usually  reads  and  writes.  A 
consistent  transaction  changes  a  consistent  database  state  to  another 
consistent  database  state.  The  database  state  need  not  be  consistent  while 
a  transaction  is  in  progress,  but  it  must  be  consistent  when  it  terminates. 

A schedule for a set of transactions is an ordered list of the
database actions specified by the transactions, preserving the order within
individual transactions. If all database transactions are consistent when
run alone, then clearly any serial schedule of transactions (i.e., a
schedule in which each transaction terminates before the next begins) will
be consistent. Thus in [ESW76] a schedule is defined to be serializable if
it can be transformed to a serial schedule by successively interchanging
database actions that cannot affect each other, and it is shown that any
serializable schedule is consistent. Subsequently, Stearns, Rosenkrantz,
and Lewis [ROS80] have shown that serializability is not only a sufficient
but a necessary condition for consistency, if we assume "full
functionality" (i.e., no restrictions placed on the interpretation of the
operations in a transaction) and all entities are read before they are
written.
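
One common way to test this property mechanically is the precedence-graph test, which is a swapped-in textbook technique and not the interchange argument of [ESW76] itself. The Python sketch below assumes a schedule is a list of hypothetical (transaction, operation, entity) triples and treats two actions as able to affect each other when they come from different transactions, touch the same entity, and at least one is a write. The schedule passes the test exactly when the resulting graph has no cycle.

```python
def precedence_edges(schedule):
    """Return edges (Ti, Tj) where an action of Ti precedes a conflicting
    action of Tj in the schedule."""
    edges = set()
    for i, (ti, op_i, x_i) in enumerate(schedule):
        for tj, op_j, x_j in schedule[i + 1:]:
            if ti != tj and x_i == x_j and "w" in (op_i, op_j):
                edges.add((ti, tj))
    return edges


def is_conflict_serializable(schedule):
    """True if the precedence graph of the schedule is acyclic."""
    edges = precedence_edges(schedule)
    nodes = {t for t, _, _ in schedule}
    while nodes:
        # Transactions with no incoming edge from a remaining transaction
        # can be placed next in an equivalent serial order.
        sources = {n for n in nodes
                   if not any(v == n and u in nodes for u, v in edges)}
        if not sources:
            return False          # every remaining node lies on a cycle
        nodes -= sources
    return True


serial = [("T1", "r", "x"), ("T1", "w", "x"), ("T2", "r", "x"), ("T2", "w", "x")]
interleaved = [("T1", "r", "x"), ("T2", "r", "x"), ("T1", "w", "x"), ("T2", "w", "x")]
```

The interleaved schedule corresponds to the double-withdrawal example: both transactions read the balance before either writes it.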

Concurrency  control  algorithms  are  thus  used  to  enforce 
serializability  of  schedules  of  transactions.  Actually,  one  class  of 
algorithms  (the  timestamp  algorithms)  may  produce  schedules  which  are  not 
strictly  serializable  but  whose  effects  are  exactly  the  same  as  some 
serializable schedule. Serializability of the schedules allowed is thus
the  standard  criterion  of  correctness  of  a  concurrency  control  algorithm. 

Several authors [LYN81, RIE81, GAR81] have proposed various
generalizations of serializability as alternative criteria for correctness
of concurrency control algorithms. Lynch's generalization provides
for the user (or application system) to specify a set of interleavings of
actions which are correct. The set may include nonserializable as well as
serializable interleavings. Garcia-Molina proposes two levels of locking,
local and global. Local locking is used to guarantee that a sequence of
actions is atomic at a single site. Global locking is used in the usual
way for detection of concurrency conflicts. The advantage of his method is
that knowledge of the database semantics may be used to allow a non-local
transaction to release local locks as soon as its local activity is
complete. For example, a transfer of money from one bank branch to another
may be considered completed as soon as it has been determined that there is
enough money in the source branch to perform the transfer. Ries and Smith
discuss "nested transactions", in which one transaction system uses
transactions provided by a second transaction system. The nested
transactions may be serialized with each other in any order, not necessarily
in the same order as the calling transactions. For example, if two database
transactions request the file system to allocate space, it is not necessary
to serialize the space allocations in the same order as the database
transactions.

2.2.1 Concurrency Control Algorithms

Bernstein and Goodman categorize concurrency control algorithms as
either two-phase locking algorithms or as timestamping algorithms [BER80].
Two-phase locking algorithms ensure consistency by prohibiting a
transaction from requesting more locks if it has released any locks. Each
transaction has a "growing" phase during which it requests locks and a
"shrinking" phase during which it releases the locks it has set. Between
these two phases is a "lockpoint"; the execution behaves as if all entities
were updated at the lockpoint. Locking schemes are prone to deadlocks and
require a policy for avoiding or breaking them.
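
The two-phase rule itself is simple enough to sketch. The Python class below is a hypothetical illustration of the rule only: it tracks the locks one transaction holds and refuses any lock request made after the first release. Lock conflicts between transactions, lock modes, and deadlock handling are deliberately left out.

```python
class TwoPhaseTransaction:
    """Enforces the two-phase rule for a single transaction (sketch only)."""

    def __init__(self, name):
        self.name = name
        self.held = set()         # entities currently locked
        self.shrinking = False    # becomes True at the first release

    def lock(self, entity):
        # Growing phase only: no new locks once any lock has been released.
        if self.shrinking:
            raise RuntimeError(f"{self.name}: lock requested after a release")
        self.held.add(entity)

    def unlock(self, entity):
        # The first release marks the start of the shrinking phase.
        self.held.remove(entity)
        self.shrinking = True


t = TwoPhaseTransaction("T1")
t.lock("x")
t.lock("y")          # lockpoint: all needed locks are now held
t.unlock("x")        # shrinking phase begins
```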

Timestamp ordering algorithms depend on assigning a unique time to
each transaction as it arrives, and guaranteeing that the effect of running
a group of transactions is the same as if they had been run serially in
arrival order. A transaction must not perform updates on the basis of data
which is out-of-date. That is, it must not overwrite an update created by
a later transaction. Also, it must not read data written by a later
transaction.
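
These rules can be sketched in Python as follows. The sketch follows the commonly described basic timestamp-ordering checks, which is an assumption about detail and not necessarily the formulation of any algorithm cited here; in particular, rejecting a write that is older than a read already performed is the usual companion rule and goes slightly beyond the two conditions stated above. All names are hypothetical.

```python
class Abort(Exception):
    """Raised when an action arrives too late in timestamp order."""


class Entity:
    def __init__(self, value):
        self.value = value
        self.read_ts = 0      # largest timestamp that has read the value
        self.write_ts = 0     # timestamp of the transaction that wrote it


def to_read(entity, ts):
    # Must not read data written by a later transaction.
    if ts < entity.write_ts:
        raise Abort("read too late")
    entity.read_ts = max(entity.read_ts, ts)
    return entity.value


def to_write(entity, ts, value):
    # Must not overwrite an update created by a later transaction, and
    # must not invalidate a value a later transaction has already read.
    if ts < entity.write_ts or ts < entity.read_ts:
        raise Abort("write too late")
    entity.value = value
    entity.write_ts = ts


balance = Entity(10)
to_write(balance, 5, 20)          # transaction with timestamp 5 writes
seen = to_read(balance, 7)        # transaction 7 reads the new value
```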

Centralized concurrency control algorithms are all locking schemes,
in which locks are controlled centrally and must be requested from a
designated site. One variant of this is Stonebraker's "primary copy"
scheme for INGRES [STO79], in which the site may vary from one data entity
to another. A decentralized algorithm which utilizes locking is called
"basic 2PL" by Bernstein and Goodman [BER80]. In this technique, the lock
on an entity is granted by the site at which it is stored. They also
describe a technique called "voting 2PL", which requires only that a
transaction obtain a majority of the locks for each data item it requires.
Since only one transaction at a time can have a majority, this is
sufficient to prevent consistency violations.

Timestamping approaches to concurrency control include voting
schemes, multi-version database algorithms, and the SDD-1 protocols. The
best-known voting scheme is probably Thomas' majority voting algorithm
[THO79] (also called the distributed voting algorithm by Garcia-Molina
[GAR78]), in which a majority of the sites must approve any transaction.
This idea has been generalized by Gifford [GIF79] to allow assignment of
any number of votes to each site, and require only that a majority of votes
be collected by a transaction. This reduces to a centralized algorithm if
one site has all the votes. As Thomas noted in [THO79], any rule will work
which requires that two conflicting transactions both get permission to
proceed from some single site.
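
The majority rule, with or without weights, can be sketched as a one-line check. The Python function below is an illustration under assumed names (`votes` maps each site to its vote count, `granted` is the set of sites that approved); it is not taken from [THO79] or [GIF79]. Because two disjoint sets of sites cannot both hold more than half of the votes, any two approved transactions must share at least one approving site.

```python
def has_majority(votes, granted):
    """True if the approving sites hold more than half of all votes."""
    total = sum(votes.values())
    collected = sum(votes[site] for site in granted)
    return 2 * collected > total


equal_votes = {"a": 1, "b": 1, "c": 1}         # one vote per site
central_votes = {"a": 1, "b": 0, "c": 0}       # one site holds all the votes

two_of_three = has_majority(equal_votes, {"a", "b"})
one_of_three = has_majority(equal_votes, {"c"})
central_ok = has_majority(central_votes, {"a"})
others_ok = has_majority(central_votes, {"b", "c"})
```

With `central_votes`, approval depends only on site `a`, which is the centralized case mentioned above.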

Reed's multi-version algorithm [REE78] requires that multiple
versions of each entity be maintained in the database, with each version
including the range of times for which the value is known to have applied.
Each action on the database has a time associated with it. If it is a read
and the entity has a value for some range of times including the read, then
the value is returned; if no such value exists, the range of times for some
value is increased to include the time of the read. If the action is a
write, it must not change a value which already holds for the time of the
write; if it tries to, the transaction is aborted.
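
A simplified Python sketch of this read and write behavior follows. It is an interpretation of the description above under stated assumptions, not Reed's actual design: each version is stored as a hypothetical [start, end, value] triple, a read extends the validity range of the most recent version that starts at or before the read time, and commit processing is ignored.

```python
class VersionConflict(Exception):
    """Raised when a write would change a value already known to hold."""


class VersionedEntity:
    def __init__(self, value, start=0):
        # Each version is [start, end, value], valid for start <= t <= end.
        self.versions = [[start, start, value]]

    def read(self, t):
        # Use the latest version that begins at or before time t.
        candidates = [v for v in self.versions if v[0] <= t]
        if not candidates:
            raise VersionConflict("no version exists at this time")
        version = max(candidates, key=lambda v: v[0])
        # Extend the range so the value is known to hold through time t.
        version[1] = max(version[1], t)
        return version[2]

    def write(self, t, value):
        # A write may not alter a value already known to hold at time t.
        for start, end, _ in self.versions:
            if start <= t <= end:
                raise VersionConflict("value already holds at this time")
        self.versions.append([t, t, value])


entity = VersionedEntity(10)
entity.write(5, 20)
late_read = entity.read(8)     # extends the version written at 5 through 8
early_read = entity.read(2)    # extends the initial version through 2
```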

The SDD-1 protocols [BER??] also utilize timestamps to guarantee
different levels of synchronization of transactions. The idea is that many
groups of transactions will require only limited synchronization with
respect to each other. To take advantage of this fact, the transactions
must be analyzed beforehand to determine the types of synchronization
required. Of course, this requires that the transactions to be used are
known beforehand. Four types of synchronization are identified. P1
synchronization is purely local; no global synchronization is attempted. P2
synchronization can be used to guarantee that reads are consistent,
although they may be out-of-date. The largest local entity timestamp at
the site initiating the transaction is used as the time of the read. P3
synchronization guarantees that reads are up-to-date as of the current time
of the transaction; this is used for potentially conflicting updates. P4
synchronization is used for unanticipated transactions and for P2 or P3
transactions requiring so many entities they might be subject to
starvation.

The classification into locking and timestamping algorithms refers to
the method used to prevent consistency violations. The locking algorithms
require that a transaction must reach a lockpoint, when it has exclusive
control of all data-items, before it may complete. The timestamping
algorithms require that actions on data-items be performed in timestamp
order of the transactions requesting the actions. But it is also necessary
to decide what to do with transactions that never reach their lockpoints
(due to deadlocks) and with transactions that discover a timestamp conflict
with other active transactions.

2.2.2 Deadlock Management

Deadlocks may be handled either by deadlock detection or deadlock
prevention. Deadlock detection requires maintaining a graph of active
transactions. The nodes of the graph represent transactions and the arcs
represent the "waits-for" relation. Deadlock prevention requires
guaranteeing that no deadlocks ever occur.
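
In the detection approach, a deadlock corresponds to a cycle in the waits-for graph. The following Python sketch shows one straightforward, centralized way to look for such a cycle with a depth-first search; the dictionary representation and the function name are assumptions made for illustration.

```python
def find_deadlock(waits_for):
    """Return a list of transactions forming a cycle, or None.

    waits_for maps each transaction to the set of transactions it waits for.
    """
    visiting, done = set(), set()

    def visit(node, path):
        visiting.add(node)
        path.append(node)
        for nxt in waits_for.get(node, ()):
            if nxt in visiting:
                # nxt is already on the current path, so a cycle is closed.
                return path[path.index(nxt):]
            if nxt not in done:
                cycle = visit(nxt, path)
                if cycle:
                    return cycle
        visiting.discard(node)
        done.add(node)
        path.pop()
        return None

    for start in list(waits_for):
        if start not in done:
            cycle = visit(start, [])
            if cycle:
                return cycle
    return None


deadlocked = {"T1": {"T2"}, "T2": {"T3"}, "T3": {"T1"}}
waiting_only = {"T1": {"T2"}, "T2": {"T3"}}
```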

Centralized deadlock detection could be used with a centralized
locking algorithm. However, it would be extremely expensive with a
decentralized algorithm. Two methods for decentralized deadlock detection
are described in [MEN78]. One method imposes a hierarchy on the network
and detects deadlocks at the lowest possible node of the tree. This method
was designed to reduce the communications cost incurred with centralized
deadlock detection. The second method requires recursively sending
notification of new "blocking transactions" to the originating site of each
transaction thus blocked. This method was designed to continue functioning
in a system prone to failures.

If a deadlock prevention method is to be used, one way of
guaranteeing that no deadlocks occur is to guarantee that locks are
assigned in the same order to all transactions for all entities referenced
at all sites. This can be done by assigning sequence numbers to
transactions and granting lock requests to the lowest pending sequence number.
This technique is used in Garcia-Molina's "hole list" (MCLA-h) scheme
[GAR78, GAR79]. In this scheme, instead of requiring each action on a
database entity to wait at the central site for a lock, a sequence number
is assigned to the transaction, and the action proceeds immediately to the
distributed sites. The "hole list" refers to a list of sequence numbers of
transactions the sites need not wait for. Another technique using sequence
numbers is Lelann's token-passing scheme [LEL78], in which the site with
the "right" to grant sequence numbers is the site having possession of a
token, which is passed around a ring. Two other locking algorithms using
sequence numbers are the centralized WAIT-DIE and WOUND-WAIT algorithms of
Rosenkrantz, Stearns, and Lewis [ROS78].
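
The decision rules behind WAIT-DIE and WOUND-WAIT are usually stated as below. This Python sketch assumes that a smaller sequence number means an older transaction and reduces each algorithm to the single decision made when a requester finds a lock held by another transaction; the function names and return strings are hypothetical.

```python
def wait_die(requester_seq, holder_seq):
    """An older requester waits; a younger requester is aborted ("dies")."""
    return "wait" if requester_seq < holder_seq else "abort requester"


def wound_wait(requester_seq, holder_seq):
    """An older requester aborts ("wounds") the holder; a younger one waits."""
    return "abort holder" if requester_seq < holder_seq else "wait"


# Transaction 1 is older than transaction 2.
old_requests_wd = wait_die(1, 2)
young_requests_wd = wait_die(2, 1)
old_requests_ww = wound_wait(1, 2)
young_requests_ww = wound_wait(2, 1)
```

In both rules a transaction only ever waits in one direction of the age ordering, so no cycle of waiting transactions can form.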

2.2.3 Conflict Resolution

Finally, with either locking or timestamping algorithms, it is
necessary to decide how to resolve conflicts (i.e., deadlocks, potential
deadlocks, and timestamp conflicts). This can be done by using a sequence
numbering scheme such as the "valid numbering schemes" described in [ROS78] or
by voting, as in [THO79] and [GIF79]. Timestamps qualify as a valid
numbering scheme. Algorithms which avoid conflicts by assigning a number or
timestamp to each transaction and then forcing each transaction to wait
until all previous transactions have executed will be classified as
resolving conflict using a numbering scheme.

2.3 Reliability in a Distributed Database

The goals of a reliability mechanism in a distributed database system
are to guarantee that:

• the active sites can continue to function in the presence of
failure; and

• a failed site can be restored to the system when the cause of the
failure is corrected.

The first goal, to permit the system to continue to function in the
presence of failure, requires (1) detection of the failure; (2) possible
reconfiguration of the system after the failure is detected; and (3)
preserving the atomicity of transactions that may be active both before and
after the failure. The second goal, restoring the failed site after the
cause of the failure is corrected, requires (1) sufficient information to
determine what the current state of the site should be; (2) a protocol for
reintroducing it into the system; and (3) possible reconfiguration of the
system after the failed site has recovered.

In this project, it will be assumed that failures are detected by
some means (e.g., as in the "local status layer" of RELNET [HAM81]). The
remaining problems are reconfiguration; atomicity; information
requirements; and the post-failure protocol.

2.3.1  Reconfiguration 

Reconfiguration may not be required if the site algorithms are
written so that the system continues to function in the same way in spite of
failures. In many cases, however, there are "special-purpose" sites
(primary sites, in [ALS76] and INGRES [STO77, STO79]; spoolers and commit
backup processes, in SDD-1 [HAM81]) whose functions must be reassigned to
other sites when the special-purpose site fails. The reassignment may be
fixed before the site failure, as in [ALS76] and [HAM81], or it may be
determined after the site failure (e.g., by a vote of the live sites
[GAR81]). Reconfiguration following a site recovery would then involve
reassigning a special function to the recovered site or possibly assigning
it as a backup for such a function.

2.3.2  Atomicity 

A transaction is defined as a set of primitive database operations.
It is required to be an atomic unit of action, that is, either all
operations of the transaction are performed or none are. In a distributed
system, this means that if any site decides to "commit" itself to the
transaction, then all sites must. Also, if any site decides not to perform
the transaction, then the remaining sites must agree.

Site failure raises the possibility that the failed site may never
know what decision the other sites came to. Conversely, the other sites
involved may not know what the failed site decided to do. But the
atomicity requirement means that all sites must agree on the decision, in
spite of failures.

The standard solution to this problem is the "two-phase commit"
protocol [GRA78]. In the first phase, changes to the database are made in
a reversible way. In the second phase (the "commit" phase), when it is
known that all sites making changes have agreed to make them, the changes
are made permanent. If any site decides not to make the changes, then the
transaction is aborted. The basic choices are how to make the changes
reversible and how to decide to make the changes permanent.
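
The two phases can be sketched as a small coordinator loop in Python. This is a failure-free illustration under assumed names (`Participant`, `prepare`, `commit`, `abort` are hypothetical); it does not model message loss, timeouts, or site crashes, which are the cases the protocol is really designed around.

```python
class Participant:
    """A site taking part in the transaction (sketch only)."""

    def __init__(self, name, will_agree=True):
        self.name = name
        self.will_agree = will_agree
        self.state = "idle"

    def prepare(self):
        # Phase 1: make the changes reversibly and report a vote.
        self.state = "prepared" if self.will_agree else "aborted"
        return self.will_agree

    def commit(self):
        self.state = "committed"   # changes become permanent

    def abort(self):
        self.state = "aborted"     # reversible changes are undone


def two_phase_commit(participants):
    # Phase 1: collect a vote from every site.
    votes = [p.prepare() for p in participants]
    # Phase 2: commit only if every site agreed; otherwise abort everywhere.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"


agreeing = [Participant("a"), Participant("b")]
mixed = [Participant("a"), Participant("b", will_agree=False)]
outcome_ok = two_phase_commit(agreeing)
outcome_no = two_phase_commit(mixed)
```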

Changes may be made reversible in two ways: by writing an UNDO log
entry before making the changes, or by changing copies only until the
decision to make the changes permanent has been made. The logging technique
is discussed in [GRA78]. Updating of copies only is used in DELTA [LEL81].
Reed's multiversion system [REE78] may also be viewed as updating only
copies until the commit is made.
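
The UNDO log alternative can be illustrated with a short sketch. The function names and the in-memory dictionary used as the database are assumptions made for illustration; a real log would be written to stable storage before the change is applied.

```python
_ABSENT = object()  # marks an item that had no value before the change


def apply_with_undo(store, changes, undo_log):
    """Log each item's old value, then apply the change in place."""
    for key, new_value in changes.items():
        undo_log.append((key, store.get(key, _ABSENT)))  # log entry first
        store[key] = new_value                            # then the change


def undo(store, undo_log):
    """Reverse the logged changes, most recent first."""
    while undo_log:
        key, old_value = undo_log.pop()
        if old_value is _ABSENT:
            store.pop(key, None)  # the item did not exist before
        else:
            store[key] = old_value
```

With this approach the database itself holds the tentative values, so an abort costs work (replaying the log backwards) while a commit only requires discarding the log. Updating copies has the opposite trade-off.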

The decision to make the changes permanent may be made by a vote of
all involved sites [GRA78,LEL81] or by reaching the "normal" or "abnormal"
end of a transaction [REE78,ROS78]. Which technique is used is related to
the underlying model of transaction execution. If a transaction is
executed by a "transaction manager" (SDD-1) or a "producer" (DELTA), which
sends a sequence of read, write, and commit commands to the other sites,
then there is a natural choice of site to initiate and count the vote. If,
however, the transaction is viewed as a process which migrates from site to
site, then it is more natural to let the site at which the transaction
terminates (either normally or abnormally) make the decision to commit or
abort it, depending on the type of termination.

A problem with the two-phase commit protocol is that the final
decision to commit or abort a transaction may be delayed until after a
failed site has been recovered. For example, if the failed site is the
"transaction manager" in SDD-1 or the "producer" in DELTA, then the count
of the vote would be delayed. The system can correctly wait for the site
to recover, but the delay may be intolerable.

The alternatives are to abort the transaction immediately when a
component fails; to tolerate the delay; or to introduce a new protocol for
committing transactions. The third approach is taken in [SKE81], in which
sites seek a consensus on committing or aborting; and in SDD-1 [BER80], in
which a transaction may be aborted only on a read, so that once all
update messages have been passed to the guaranteed delivery layer of the
message system, the transaction may be committed in spite of site failures.

2.3.3  Information Requirements

A useful classification of the information used in restoring a failed
site to the system is given in [GAR81], which identifies three possibilities:
no information is used, a log is used, or "persistent messages" are used.

If no information is used, then the current state of the failed site must
be determined from the states of the active sites in the system. Thus
there must be enough redundancy in the system to allow determination of the
state of one site from some subset of the other sites. If a log is used,
as in [GRA78], the failed site recovers by performing all of the missed
actions. If "persistent messages" are used, then the communications system
must remember all the missed actions and guarantee that the failed site
receives them [HAM81].

2.3.4  Recovery Protocol

When a failed site recovers, it is first brought up-to-date, as
discussed in the preceding paragraph. Subsequently, it rejoins the system,
possibly with reconfiguration of the system. For example, in [ALS76], the
site must first request to rejoin the system. This request must be
transmitted, either directly or through other sites, to the primary site,
which informs all sites to add the "new" site to their tables. In [LEL81],
when a node wants to rejoin the system, it must get a checkpoint and all
subsequent actions to bring it up-to-date. Then it may rejoin.

2.4  Performance  Studies 

To date, there has been little published work on the performance of
concurrency control algorithms. Three major exceptions are the work of
Garcia-Molina [GAR78,GAR79], Gelenbe and Sevcik [GEL78], and Bernstein and
Goodman [BER80]. Garcia-Molina's work has focused primarily on simulating
certain algorithms. Gelenbe and Sevcik have suggested a queuing network
approach to determining two measures of internal database performance (as
opposed to external measures such as response time and throughput).
Bernstein and Goodman have also analysed many concurrency control algorithms,
comparing them according to several internal measures. The underlying
thesis of the proposed work is that the above-mentioned work can be
significantly extended by combining the approaches. The simulation
experiments can suggest theorems to prove and provide examples of system
behavior to explain by analytic methods. When analytic methods fail,
simulation can be used to clarify the behavior of the algorithms. This
technique was used with success in the work on "ticket systems" discussed
in section IV.

Garcia-Molina has done extensive simulation of 3 algorithms (and some
variants): centralized locking, distributed voting, and Ellis' ring
algorithm. Other, significantly different algorithms not covered in his
work include Reed's multiversion algorithm, the WAIT-DIE and WOUND-WAIT
algorithms, and the SDD-1 algorithms. His simulations were based on a
model that allowed only updates on a fully redundant distributed database
system. His results show that the centralized version has lower response
time for all but very heavy loads; lower I/O utilization, probably because
of the redundant I/O required in a fully redundant system; and slightly
fewer messages per update. The crucial parameter in determining the
difference seemed to be I/O utilization, suggesting that less redundancy in
the database might produce more favorable results for decentralized
algorithms. An important factor of the response time in either case is the
load, as reflected in the transaction interarrival time. The simulation
model used in the work proposed here would have to include significantly
more detail than Garcia-Molina's, in order to determine why a concurrency
control algorithm caused observed behavior patterns. For the same reason,
some additional performance measures would be of interest, such as
congestion at a node, blocking, restarts, etc.

Gelenbe and Sevcik have developed a queuing analysis technique for
evaluating distributed database systems. Their measures of database
performance are the coherence (i.e., the degree of agreement of the sites on
the value of an entity) and the promptness (i.e., the average time required
to update an entity at a site). Their techniques are illustrated in
[GEL78] on two rather special-purpose database systems but would also apply
to more general databases. The analytical technique can be used to help
validate the simulation results. The measures defined by Gelenbe and
Sevcik apply to the internal database behavior rather than to a database
user's external view of its performance. The intention of the proposed
work is to relate the performance of the database, as seen by a user, to
its internal behavior, and to relate the internal behavior to the operation
of the concurrency control algorithm in the particular distributed database
system. This will relate the design of the distributed database to the
output of the database and not just to its internal appearance.

Bernstein and Goodman have discussed the performance of a huge
variety of concurrency control algorithms in [BER80]. They use four
measures which they regard as important in the total cost of concurrency
control and which can be determined analytically from the algorithms
themselves. These measures are: communication overhead (represented by number
of messages), local processing overhead, blocking, and restarts. The
relationship of these measures to response time and throughput depends on
assumptions about how a distributed system behaves.

At present, such assumptions must necessarily be generalizations of
experience with single-computer systems, since there has been so little
experience with distributed systems. Unfortunately, it is hard to reason
about systems with which we have had little experience. As a result, many
seemingly obvious assumptions and hypotheses about distributed systems may
prove wrong. The work to be described in section IV mentions two examples
of this problem. The simulation experiments that I am proposing provide
one way to gain experience with distributed database systems.

2.5  Simulation Techniques

The simulation of a distributed database system can be done using
conventional simulation techniques. Such an approach was taken for the
"ticket system" work discussed below. However, primarily for reasons of
performance, the use of distributed simulation may be preferable. A number
of papers have appeared recently on this topic [BRY79,CHA79,PEA79]. The
primary problem with using a distributed system for simulation is the
management of simulation time when no shared variable "clock" is available.
Chandy [CHA79] has proposed a "time-exchange" system which requires each
process to maintain a time on each of its output lines, and to take the
next event from the input line with the lowest time. Peacock, Wong, and
Manning [PEA79] have extended the method of Chandy and devised other
methods as well, including a "scaled real-time" method in which simulation
time is simply scaled real time. In the terminology of Peacock, Wong, and
Manning, the simulation methods most likely to be of use in this project
are the "loose event-driven" methods (because they should provide the best
performance) and the "scaled real-time" method (to assist in developing
intuition).
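
The rule of taking the next event from the input line with the lowest time can be sketched as below. This illustrates only that selection rule as stated above, under the assumption that events on each line arrive in nondecreasing time order. The class and function names are hypothetical, and the sketch is not a full rendering of the method in [CHA79].

```python
from collections import deque


class InputLine:
    """Hypothetical input line carrying timestamped events in time order."""

    def __init__(self):
        self.events = deque()
        self.time = 0.0  # time of the latest message seen on this line

    def deliver(self, time, event):
        self.events.append((time, event))
        self.time = time

    def head_time(self):
        # Time of the earliest unprocessed event, or the line's own time
        # if nothing is queued.
        return self.events[0][0] if self.events else self.time


def next_event(lines):
    """Take the next event from the input line with the lowest time.

    Returns (time, event), or None when the lowest-time line has nothing
    queued. In that case the process has to wait, because an earlier
    event could still arrive on that line.
    """
    line = min(lines, key=InputLine.head_time)
    if not line.events:
        return None
    return line.events.popleft()
```

The None case is where the cost of having no shared clock shows up: a process may sit idle even though events are queued on its other input lines.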
CHAPTER  3

The Simulation Tool


3.1  Introduction

The objectives of this project are (1) to develop an experimental
software tool for testing and simulating distributed systems; (2) to apply
the tool to distributed database systems; (3) to develop new solutions to
distributed database problems using the results of the experiments; and (4)
to develop both experimental and analytical techniques for studying
distributed algorithms in general.

The first two objectives require development of a model of
distributed database systems. This model will of necessity include a
submodel of a distributed system. The first objective, that the tool be
applicable to distributed systems in general, requires that the submodel
be separable from the model and that it be sufficiently general to allow
study of a wide range of distributed system problems. Problems likely to
be addressed at Georgia Tech (in addition to database problems) are
distributed compilation and distributed resource allocation.

3.2  The  Distributed  Database  Model 

There are four parts to the distributed database model: the
communication system submodel, the distributed system submodel, the data
system submodel, and the user interface submodel.

3.2.1  The Communication System Submodel

In  the  communication  system  submodel,  it  will  be  assumed  that  point- 
to-point  communication  can  be  described  by  the  following  parameters: 

•  the delay time distribution function;
•  the mean delay time;
•  the variance in delay time (if applicable); and
•  the probability that a message is lost.

These parameters may change dynamically, to simulate line failures while
the system is running. The communication system submodel simulates the
data link and physical layers of the ISO reference model of open systems
interconnection [ISO81].
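
One way such a point-to-point link might be parameterized is sketched below. The names Link, transmit, exponential_delay, and normal_delay are assumptions introduced for illustration, as is the choice of the two example distributions. The proposal itself does not fix an interface.

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Link:
    """Hypothetical point-to-point link described by the listed parameters."""
    delay_sampler: Callable[[random.Random], float]  # delay time distribution
    loss_probability: float = 0.0                    # chance a message is lost

    def transmit(self, rng: random.Random) -> Optional[float]:
        """Return the delay for one message, or None if the message is lost."""
        if rng.random() < self.loss_probability:
            return None
        return self.delay_sampler(rng)


def exponential_delay(mean: float) -> Callable[[random.Random], float]:
    # Exponential delays need only a mean; the variance is fixed at mean**2.
    return lambda rng: rng.expovariate(1.0 / mean)


def normal_delay(mean: float, variance: float) -> Callable[[random.Random], float]:
    # Normal delays take a mean and a variance; negative samples are
    # clipped to zero, which is a simplification.
    return lambda rng: max(0.0, rng.gauss(mean, variance ** 0.5))
```

Because the fields are mutable, a line failure during a run could be simulated by setting loss_probability to 1.0 on the affected link, and a repair by restoring the earlier value.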
3.2.2  The Distributed System Submodel

The distributed system submodel will contain any required routing and
error recovery techniques. It simulates the transport and network layers
of the ISO reference model. For many topologies to be tested (e.g., star,
tree, and loop), the routing algorithms should be trivial. Several
standard ones can be supplied as part of the software tool. The distributed
system submodel will also contain the characteristics of the system nodes.
These will include the following parameters:

•  the node stop time;

•  the access time to secondary memory;

•  the node memory size;  and

•  the secondary memory size.

3.2.3  The  Data  System  Submodel 

The data system submodel will contain "data managers" and the user
interface submodel will contain "transaction managers", as in the Bernstein
and Goodman model of distributed database systems [BER80]. Operations
performed by the data system submodel are:

•  read a data granule (item, record, page, etc.);
•  write a data granule;
•  lock a data granule;
•  unlock a data granule;
•  read a timestamp for a data granule;
•  set a timestamp for a data granule;
•  commit a data granule.

The definition of data granule is similar to the definition of Ries and
Stonebraker [RIE77]. It specifies the smallest unit of data that can be
locked and unlocked (for concurrency control), read and written (for query
processing), or written to secure storage (for reliability). To permit
study of algorithms in which the transaction managers do not know where
data may be stored (only the data managers know where it is), the data
managers will be allowed to communicate with each other. To permit study
of algorithms assuming that transactions may be passed from site to site,
the transaction managers will also be allowed to communicate.
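
The seven data manager operations listed above could take roughly the following shape. This is a sketch under stated assumptions: the class name, method names, and in-memory dictionaries are hypothetical, locks are exclusive and non-blocking, and "commit" is taken to mean copying a granule's working value to secure storage. The proposal does not specify these semantics.

```python
class DataManager:
    """Hypothetical data manager exposing the seven listed operations."""

    def __init__(self):
        self.values = {}      # working value of each granule
        self.timestamps = {}  # timestamp of each granule
        self.locks = {}       # granule -> transaction holding the lock
        self.secure = {}      # stand-in for secure storage

    def read(self, granule):
        return self.values.get(granule)

    def write(self, granule, value):
        self.values[granule] = value

    def lock(self, granule, txn):
        # Grant the lock if it is free or already held by txn.
        holder = self.locks.setdefault(granule, txn)
        return holder == txn

    def unlock(self, granule, txn):
        # Only the holder may release the lock.
        if self.locks.get(granule) == txn:
            del self.locks[granule]

    def read_timestamp(self, granule):
        return self.timestamps.get(granule)

    def set_timestamp(self, granule, timestamp):
        self.timestamps[granule] = timestamp

    def commit(self, granule):
        # Copy the working value to secure storage, if there is one.
        if granule in self.values:
            self.secure[granule] = self.values[granule]
```

Keeping locking and timestamp operations on the same interface lets either a locking algorithm or a timestamp-based one be plugged in above the data manager without changing it.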
3.2.4  The User Interface Submodel

The user interface submodel will process the transactions.
Transactions are identified by special delimiting statements at the
beginning and end. The statements inside a transaction may be any sequence
of data manager operations.
Figure 1. A Schematic of the Distributed Database System Model


3.3  System Architecture and Specifications

The  proposed  experimental  tool  will  contain  a  module  corresponding  to 
each  of  the  submodels  discussed  in  the  preceding  section.  Parameters  may 
be  specified  independently  for  each  module  and  algorithms  may  be  plugged 
into  the  appropriate  module. 

3.3.1  Output Analysis

In  addition,  the  results  of  the  simulation  must  be  tabulated.  To 
accomplish  this  purpose,  each  system  action  (i.e.,  message  or  access  to  a 
database)  will  be  logged.  The  log  will  be  used  to  compute  the  following 
basic  measures: 

•  expected  response  time; 

•  throughput; 

•  utilization;  and 

•  queue  length  at  each  node.

Expected response time, throughput, and queue lengths can be computed on
the basis of the user interface module output. Utilization must be
computed from information recorded by the communication system, distributed
system, and data system modules.
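
As an illustration of computing basic measures from logged output, the sketch below derives a mean response time and a throughput figure. The log layout (one pair of submit and completion times per finished transaction) and the function name are assumptions, and the sample mean is used as the estimate of expected response time.

```python
def basic_measures(log):
    """Compute mean response time and throughput from a transaction log.

    `log` is a list of (submit_time, complete_time) pairs, one per
    completed transaction. This record layout is assumed for illustration.
    """
    if not log:
        return {"mean_response_time": 0.0, "throughput": 0.0}
    response_times = [done - submitted for submitted, done in log]
    start = min(submitted for submitted, _ in log)
    end = max(done for _, done in log)
    elapsed = end - start
    return {
        "mean_response_time": sum(response_times) / len(log),
        # Completed transactions per unit of simulated time.
        "throughput": len(log) / elapsed if elapsed > 0 else 0.0,
    }
```

Utilization and queue lengths would need additional log fields (for example, per-node busy intervals), so they are left out of this sketch.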

Secondary  measures  whose  relationship  to  the  primary  measures  will  be 
of  interest  are: 

•  number  of  messages; 

•  number  of  bits  sent;

•  number  of  errors  in  transmission; 

•  number  of  nodes  (dispersion)  required  by  a  transaction  or  query; 

•  number  of  nodes  actually  used  in  responding  to  a  transaction  or
query;

•  local  processing  overhead;  and 

•  I/O  time.